{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# An introduction to solving biological problems with Python\n",
    "\n",
    "## Session 1.4: Loops\n",
    "\n",
    "- [The <tt>for</tt> loop](#The-for-loop)\n",
    "- [The <tt>while</tt> loop](#The-while-loop)\n",
    "- [Skipping and breaking loops](#Skipping-and-breaking-loops)\n",
    "- [Exercises 1.4.1](#Exercises-1.4.1)\n",
    "- [More looping using range and enumerate](#More-looping)\n",
    "- [Filtering in loops](#Filtering-in-loops)\n",
    "- [Exercises 1.4.2](#Exercises-1.4.2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loops"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When an operation needs to be repeated multiple times, for example on all of the items in a list, we \n",
    "avoid having to type (or copy and paste) repetitive code by creating a loop. There are two ways of creating loops in Python, the <tt>for</tt> loop and the <tt>while</tt> loop."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The <tt>for</tt> loop"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The for loop in Python iterates over each item in a sequence (such as a list or tuple) in the order that they appear in the sequence. What this means is that a variable (<tt>code</tt> in the below example) is set to each item from the sequence of values in turn, and each time this happens the indented block of code is executed again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "codeList = ['NA06984', 'NA06985', 'NA06986', 'NA06989', 'NA06991']\n",
    "\n",
    "for code in codeList:\n",
    "    print(code)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A <tt>for</tt> loop can iterate over the individual characters in a string:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "dnaSequence = 'ATGGTGTTGCC'\n",
    "\n",
    "for base in dnaSequence:\n",
    "    print(base)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And also over the keys of a dictionary: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "rnaMassDict = {\"G\":345.21, \"C\":305.18, \"A\":329.21, \"U\":302.16}\n",
    "\n",
    "for x in rnaMassDict:\n",
    "    print(x, rnaMassDict[x])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Any variables that are defined before the loop can be accessed from inside the loop. So for example to calculate the summation of the items in a list of values we could define the total initially to be zero and add each value to the total in the loop:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "total = 0\n",
    "values = [1, 2, 4, 8, 16]\n",
    "\n",
    "for v in values:\n",
    "    total = total + v\n",
    "    # total += v\n",
    "    print(total)\n",
    "\n",
    "print(total)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Naturally we can combine a <tt>for</tt> loop with an <tt>if</tt> statement, noting that we need two indentation levels, one for the outer loop and another for the conditional blocks:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "geneExpression = {\n",
    "    'Beta-Catenin': 2.5, \n",
    "    'Beta-Actin': 1.7, \n",
    "    'Pax6': 0, \n",
    "    'HoxA2': -3.2\n",
    "}\n",
    "\n",
    "for gene in geneExpression:\n",
    "    if geneExpression.get(gene) < 0:\n",
    "        print(gene, \"is downregulated\")\n",
    "        \n",
    "    elif geneExpression.get(gene) > 0:\n",
    "        print(gene, \"is upregulated\")\n",
    "        \n",
    "    else:\n",
    "        print(\"No change in expression of \", gene)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The <tt>while</tt> loop"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition to the <tt>for</tt> loop that operates on a collection of items, there is a <tt>while</tt> loop that simply repeats while some statement evaluates to True and stops when it is False. Note that if the tested expression never evaluates to False then you have an “infinite loop”, which is not good.\n",
    "\n",
    "In this example we generate a series of numbers by doubling a value after each iteration, until a limit is reached: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "value = 0.25\n",
    "while value < 8:\n",
    "    value = value * 2\n",
    "    print(value)\n",
    "\n",
    "print(\"final value:\", value)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Whats going on here is that the value is doubled in each iteration and once it gets to 8 the while test fails (8 is not less than 8) and that last value is preserved. Note that if the test were instead value `<= 8` then we would get one more doubling and the value would reach 16."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Skipping and breaking loops"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Python has two ways of affecting the flow of the <tt>for</tt> or <tt>while</tt> loop inside the block. The <tt>continue</tt> statement means that the rest of the code in the block is skipped for this particular item in the collection, i.e. jump to the next iteration. In this example negative numbers are left out of a summation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "values = [10, -5, 3, -1, 7]\n",
    "\n",
    "total = 0\n",
    "for v in values:\n",
    "    if v < 0:\n",
    "        continue # Skip this iteration   \n",
    "    total += v\n",
    "\n",
    "print(total)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The other way of affecting a loop is with the <tt>break</tt> statement. In contrast to the <tt>continue</tt> statement, this immediately causes all looping to finish, and execution is resumed at the next statement _after_ the loop."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "geneticCode = {'TAT': 'Tyrosine',  'TAC': 'Tyrosine',\n",
    "               'CAA': 'Glutamine', 'CAG': 'Glutamine',\n",
    "               'TAG': 'STOP'}\n",
    "\n",
    "sequence = ['CAG','TAC','CAA','TAG','TAC','CAG','CAA']\n",
    "\n",
    "for codon in sequence:\n",
    "    if geneticCode[codon] == 'STOP':\n",
    "        break            # Quit looping at this point\n",
    "    else:\n",
    "        print(geneticCode[codon])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Looping gotchas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An internal counter is used to keep track of which item is used next, and this is incremented on each iteration. When this counter has reached the length of the sequence the loop terminates. This means that if you delete the current item from the sequence, the next item will be skipped (since it gets the index of the current item which has already been treated). Likewise, if you insert an item in a sequence before the current item, the current item will be treated again the next time through the loop. This can lead to nasty bugs that can be avoided by making a temporary copy using a slice of the whole sequence."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert-warning\">\n",
    "**When looping, never modify the collection!** Always create a copy of it first.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercises 1.4.1\n",
    "\n",
    "1. Create a sequence where each element is an individual base of DNA. Make the sequence 15 bases long.\n",
    "2. Print the length of the sequence.\n",
    "3. Create a for loop to output every base of the sequence on a new line.\n",
    "4. Create a <tt>while</tt> loop similar to the one above that starts at the third base in the sequence and outputs every third base until the 12th."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## More looping"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using range"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you would like to iterate over a numeric sequence then this is possible by combining the `range()` function and a for loop."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print(list(range(10)))\n",
    "\n",
    "print(list(range(5, 10)))\n",
    "\n",
    "print(list(range(0, 10, 3)))\n",
    "\n",
    "print(list(range(7, 2, -2)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Looping through ranges "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "for x in range(8):\n",
    "    print(x*x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "squares = []\n",
    "for x in range(8):\n",
    "    s = x*x\n",
    "    squares.append(s)\n",
    "    \n",
    "print(squares)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Looping through list indices"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "codes = ['NA06984', 'NA06985', 'NA06986', 'NA06989', 'NA06991']\n",
    "\n",
    "for index in range(len(codes)):\n",
    "    print(index, codes[index])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Looping through indices for two lists"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "codes      = ['NA06984', 'NA06985', 'NA06986', 'NA06989', 'NA06991']\n",
    "more_codes = ['NA06993', 'NA06994', 'NA06995', 'NA06997', 'NA07000']\n",
    "\n",
    "for index in range(len(codes)):\n",
    "    print(index, codes[index], more_codes[index])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using enumerate"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given a sequence, `enumerate()` allows you to iterate over the sequence generating a tuple containing each value along with a corresponding index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "letters = ['A','C','G','T']\n",
    "for index, letter in enumerate(letters):\n",
    "    print(index, letter)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "numbered_letters = list(enumerate(letters))\n",
    "print(numbered_letters)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Filtering in loops"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "city_pops = {\n",
    "    'London': 8200000,\n",
    "    'Cambridge': 130000,\n",
    "    'Edinburgh': 420000,\n",
    "    'Glasgow': 1200000\n",
    "}\n",
    "\n",
    "big_cities = []\n",
    "for city in city_pops:\n",
    "    if city_pops[city] >= 1000000:\n",
    "         big_cities.append(city)\n",
    "\n",
    "print(big_cities)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "total = 0\n",
    "for city in city_pops:\n",
    "    total += city_pops[city]\n",
    "print(\"total population:\", total)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "pops = list(city_pops.values())\n",
    "print(\"total population:\", sum(pops))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercises 1.4.2\n",
    "\n",
    "1. Let's calculate the GC content of a DNA sequence. Use the 15-base sequence you created for the exercises above. Create a variable, `gc`, which we will use to count the number of Gs or Cs in our sequence.\n",
    "2. Create a loop to iterate over the bases in your sequence. If the base is a G or the base is a C, add one to your `gc` variable.\n",
    "3. When the loop is done, divide the number of GC bases by the length of the sequence and multiply by 100 to get the GC percentage."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Congratulation! You reached the end of day 1! \n",
    "\n",
    "Go to our next notebook: [Introduction_to_python_day_2_introduction](Introduction_to_python_day_2_introduction.ipynb)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
